A combined priority scheduling method for distributed machine learning

Authors

Abstract

Algorithms and frameworks for distributed machine learning have been widely used in numerous artificial intelligence engineering applications. A cloud platform provides a large number of resources at lower cost and is a more convenient method for such applications. With the rapid development of containerization, cloud-native combinations based on Docker and Kubernetes have provided effective resource support for distributed machine learning. However, native Kubernetes does not provide efficient priority or fair scheduling strategies for computationally intensive and time-consuming jobs, which easily leads to resource deadlock, resource waste, and low job execution efficiency. Therefore, to utilize the priority order between multiple jobs as well as the dependencies between tasks in the same job, a combined priority scheduling method that considers intra- and inter-group priorities is proposed based on Volcano. Considering user priority, task priority, longest wait time, task parallelism, and the affinity and non-affinity between parameter server and worker nodes, a priority model of inter- and intra-job priorities is proposed and mapped onto a scheduling strategy of inter- and intra-group priorities of pods, enabling priority scheduling of distributed machine learning training jobs. The experimental results show that the method achieves preferential resource allocation for urgent, highly parallel, and high-priority jobs submitted by high-priority users and improves job execution efficiency. The affinity and anti-affinity settings among pods reduce the time of information interaction between parameter server and worker nodes to a certain extent, thereby improving job completion efficiency. This combined group scheduling strategy alleviates the problems of resource deadlock and waste caused by insufficient cloud computing resources.
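The abstract names the factors of the priority model but not how they are combined. The Python sketch below illustrates one possible reading, under stated assumptions: a weighted sum of user priority, job priority, normalized wait time and parallelism ranks jobs (inter-group priority), parameter-server pods are ordered before worker pods within a job (intra-group priority), and a job is admitted only if all of its pods fit, in the spirit of Volcano's gang scheduling. The `Job` class, the `WEIGHTS` values, the 1 to 3 priority scale and the functions `job_score`, `intra_order` and `schedule` are hypothetical; this is neither the paper's actual model nor the Volcano API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical weights for the combined score; not taken from the paper.
WEIGHTS = {"user": 0.4, "job": 0.3, "wait": 0.2, "par": 0.1}
MAX_LEVEL = 3  # assumed priority scale: 1 (low) to 3 (high)


@dataclass
class Job:
    """Hypothetical job record holding the factors named in the abstract."""
    name: str
    user_priority: int   # priority of the submitting user, 1..MAX_LEVEL
    job_priority: int    # priority of the job itself, 1..MAX_LEVEL
    wait_time: float     # seconds spent waiting in the queue
    pods: List[str] = field(default_factory=list)  # e.g. "ps-0", "worker-0"

    @property
    def parallelism(self) -> int:
        # Degree of parallelism = number of pods the job needs.
        return len(self.pods)


def job_score(job: Job, max_wait: float, max_par: int) -> float:
    """Inter-group (inter-job) priority: weighted sum of normalized factors."""
    wait = job.wait_time / max_wait if max_wait > 0 else 0.0
    par = job.parallelism / max_par if max_par > 0 else 0.0
    return (WEIGHTS["user"] * job.user_priority / MAX_LEVEL
            + WEIGHTS["job"] * job.job_priority / MAX_LEVEL
            + WEIGHTS["wait"] * wait
            + WEIGHTS["par"] * par)


def intra_order(pods: List[str]) -> List[str]:
    """Intra-group (intra-job) priority: parameter servers before workers.
    sorted() is stable, so pods of the same role keep their original order."""
    return sorted(pods, key=lambda pod: 0 if pod.startswith("ps") else 1)


def schedule(jobs: List[Job], free_slots: int) -> List[str]:
    """Admit whole jobs in descending score order (all-or-nothing, gang-style):
    a job whose pods do not all fit is skipped, so no job holds a partial
    allocation that could deadlock with another job."""
    max_wait = max((j.wait_time for j in jobs), default=0.0)
    max_par = max((j.parallelism for j in jobs), default=0)
    ranked = sorted(jobs, key=lambda j: job_score(j, max_wait, max_par),
                    reverse=True)
    placed: List[str] = []
    for job in ranked:
        if job.parallelism <= free_slots:
            free_slots -= job.parallelism
            placed.extend(f"{job.name}/{pod}" for pod in intra_order(job.pods))
    return placed


if __name__ == "__main__":
    jobs = [
        Job("a", 1, 1, 10.0, ["worker-0", "ps-0"]),
        Job("b", 3, 3, 5.0, ["worker-0", "worker-1", "ps-0"]),
        Job("c", 2, 2, 20.0, ["ps-0", "worker-0", "worker-1", "worker-2"]),
    ]
    print(schedule(jobs, free_slots=5))
    # → ['b/ps-0', 'b/worker-0', 'b/worker-1', 'a/ps-0', 'a/worker-0']
```

In this example job `c` ranks above `a` but needs four slots while only two remain, so it is skipped rather than partially started. Admitting the smaller job behind it keeps slots busy but can delay large jobs; in this sketch only the normalized wait-time term, which grows across scheduling rounds, pushes such a job upward over time.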

Similar resources

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various artificial intelligence techniques have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

On Model Parallelization and Scheduling Strategies for Distributed Machine Learning

Distributed machine learning has typically been approached from a data parallel perspective, where big data are partitioned to multiple workers and an algorithm is executed concurrently over different data subsets under various synchronization schemes to ensure speed-up and/or correctness. A sibling problem that has received relatively less attention is how to ensure efficient and correct model...

Machine scheduling for multitask machining

Multitasking is an important part of today’s manufacturing plants. Multitask machine tools are capable of processing multiple operations at the same time by applying a different set of part and tool holding devices. Mill-turns are multitask machines with the ability to perform a variety of operations with considerable accuracy and agility. One critical factor in simultaneous machining is to cre...

Online Job Scheduling in Distributed Machine Learning Clusters

Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural network, multiple workers are run in parallel to train partitions of the input dataset, and update shared model parameters. In a shared cluster handling multiple tr...

Journal

Journal title: Eurasip Journal on Wireless Communications and Networking

Year: 2023

ISSN: 1687-1499, 1687-1472

DOI: https://doi.org/10.1186/s13638-023-02253-4